Do Test Formats in Reading 
Comprehension Affect Second-Language 
Students' Test Performance Differently? 

Ying Zheng, Liying Cheng, and Don A. Klinger 


Large-scale testing in English affects second-language students not only greatly 
but also differently than first-language learners. The research literature reports 
that confounding factors in such large-scale testing such as varying test formats 
may differentially affect the performance of students from diverse backgrounds. 
Test performance of ESL/ELD and non-ESL/ELD students on the Ontario Secondary
School Literacy Test (OSSLT) was compared to investigate whether test formats
in reading comprehension affected the two groups differently. The results
indicate that the overall pattern of difficulty levels on the three test
formats was the same for ESL/ELD students and non-ESL/ELD students, except
that ESL/ELD students performed substantially lower on each format and that
more variability was found among ESL/ELD
students. Further, discriminant analysis results indicated that only the multiple- 
choice questions obtained a significant discriminant coefficient in differentiating 
the two groups. The results suggest a lack of association between test formats and 
test performance. 

L'effet qu'ont les évaluations à grande échelle en anglais sur les élèves en
langue seconde n'est pas seulement important, il est également différent de
celui qu'elles ont sur les élèves en langue première. La recherche indique que
dans les évaluations à grande échelle, les variables confusionnelles telles
que les formats variés peuvent ne pas avoir le même effet sur des élèves
d'origines différentes. La performance d'élèves en anglais langue
seconde/développement de la langue anglaise (ESL/ELD) au test d'aptitude à
lire et à écrire au secondaire de l'Ontario a été comparée à celle d'élèves
qui n'étaient pas dans le programme ESL/ELD pour déterminer si le format de
l'évaluation de la compréhension à l'écrit avait le même effet sur les deux
groupes. Les résultats indiquent que la performance globale en termes de
niveaux de difficulté aux trois formats de test était semblable pour les deux
groupes. Toutefois, la performance des élèves ESL/ELD était sensiblement
inférieure et plus variable pour chaque format. De plus, une analyse
discriminante a révélé que seules les questions à choix multiples donnent une
fonction discriminante significative lors de la comparaison des deux groupes.
Les résultats donnent à penser qu'il n'y a pas de lien entre le format des
tests et les performances des élèves.

Introduction

Research in language testing has pointed out that test-takers with different 
characteristics might be affected by a test in ways that are not relevant to the 
abilities being tested (Bachman, 1990; Kunnan, 1998). Test format has been 
shown to be an important facet that could influence different test-takers' test 
performance (Bachman & Palmer, 1982; Shohamy, 1984, 1997). The issue of 
test format differences has been the subject of debate because it is generally 
assumed that different test formats elicit different levels of skills or abilities; 
therefore, such tests are subject to having different effects on test-takers from 
various linguistic and cultural backgrounds. Kunnan (2004) raised the issue 
of test fairness, arguing that certain test formats may favor some groups of 
test-takers but not others, threatening the validity of a particular test. 
Shohamy (1997) claimed that language tests employing test methods that are 
unfair to different groups of test-takers are unethical. If group performance 
differences do exist, the reason should be real differences in the skills or 
abilities being tested instead of confounding variables such as test formats 
(Elder, 1997). 

The present study aimed to determine whether test formats in reading 
comprehension on the Ontario Secondary School Literacy Test (OSSLT) af- 
fected English as a Second Language (ESL) and English Literacy Develop- 
ment (ELD) students differently than their non-ESL/ELD counterparts. ESL 
students are defined in the Ontario curriculum as students whose first lan- 
guage is not English, but who have received educational experience in their 
own countries using their first language. ELD students are those who are 
from countries or regions where access to education may have been limited 
and who have had few opportunities to develop literacy skills in any lan- 
guage (Ministry of Education and Training, 1999). ESL and ELD students are 
students who are identified by their school as ESL/ELD learners and who are 
recommended to take ESL and/or ELD courses. These students are also 
referred to as second-language students in this article. Unfortunately, infor- 
mation about the length of time these students had been in Canada, the level 
of their English proficiency, and any previous training experience for the 
OSSLT test or similar tasks was not available for this study. 

Research Background 

The OSSLT is a provincially mandated standardized test of English literacy. 
It is a graduation requirement for all Ontario secondary students in order to 
receive their secondary school diploma. Administered by the Education 
Quality and Accountability Office (EQAO), this test is designed to assess the 
literacy skills that students are expected to have learned in all subjects by the 
end of grade 9 in Ontario (ages 15-16). The test consists of two major
components: writing and reading. In the writing component four types of writing
task are included: a summary, a series of paragraphs expressing an opinion,
a news report, and an information paragraph. In the reading component are 
100 questions about 12 reading selections based on three types of texts: 
information (50%), consisting of explanation and opinion; graphic (25%), 
consisting of graphs, schedules and instructions; and narrative (25%), con- 
sisting of stories and dialogues. The students are expected to demonstrate the 
following three reading skills as required: understanding directly stated 
ideas and information; understanding indirectly stated ideas and informa- 
tion; and making connections between personal experiences and information 
in a reading selection (these terms are used in conformity with the EQAO 
terms). Finally, the comprehension questions employed to assess students' 
reading abilities are in three test formats: multiple-choice (MC) questions, 
constructed-response (CR) questions, and constructed-response questions 
with explanations (CRE; see Appendix 1). The CR questions require a short
student response to the question. The CRE questions require a longer re- 
sponse, and students are not only expected to justify or explain the thinking 
behind their answers, but also to integrate personal knowledge and experi- 
ence to extend the meaning. The MC and CR items on the reading com- 
ponent are scored on a 2-point scale (0, 2), and the CRE items are scored 
using item-specific scoring rubrics on a 3-point scale (2 marks for correct, 1 
mark for partly correct, or 0 for incorrect). Reading abilities are defined by 
the EQAO in terms of reading with reasonable accuracy and proficiency in 
English: in other words, students are asked to connect relevant ideas and 
information so as to understand the meaning of the selected reading pas- 
sages and to demonstrate moderate success in integrating their personal 
knowledge and experience to extend the meaning (EQAO, 2002). 

EQAO reports of provincial results show that ESL/ELD students tend to 
fail the test and also to defer² writing the test at a far higher rate than the rest
of the student population. For example, in October 2003 only 42% of the 
ESL/ELD students passed the whole test compared with an overall pass rate 
of 77%. About 45% of the ESL/ELD students passed reading only, and about 
69% of the students passed writing only (compared with the overall rates of 
82% and 88% respectively for all students who wrote the test). In terms of 
deferral rates, just over half of the ESL/ELD students (54%) participated in 
the test as compared with an overall participation rate of 91% (EQAO, 2003). 
Coupled with this higher deferral rate, the substantially higher failure rates 
of ESL/ELD students on the OSSLT suggest that this group of students is 
encountering great difficulty in meeting the graduation requirement. 

Research acknowledges the potential effect of large-scale testing, noting 
that it has brought about both intended and unintended consequences to 
diverse groups of students (Madaus & Clarke, 2001). Minority students, 
including second-language students (such as ESL/ELD students in Ontario), 
are among the most vulnerable to the effects of such large-scale testing
policies (Shepard, 1991). These tests also tend to have more severe
consequences for minority students and students from poor families (Horn, 2003;
Madaus & Clarke). Two reasons potentially account for the more adverse 
effects of the OSSLT on ESL/ELD students than on non-ESL/ELD students. 
First, in terms of measuring English literacy development, ESL/ELD stu- 
dents may not be in a position equal to that of their non-ESL/ELD counter- 
parts who have probably been part of the Canadian educational system for 
most if not all of their education and are likely as well to speak English as a 
first language. ESL/ELD students, however, may have been in the system 
only for a limited period before writing the OSSLT and are typically still 
struggling with the use of English as a second language. Researchers in 
second-language education suggest that four to eight years are required for 
ESL/ELD students to attain a level of language proficiency necessary to 
compete on a minimally competent level with their non-ESL/ELD counter- 
parts (Collier, 1989; Cummins, 1981; Roessingh, 1999), and if ESL/ELD stu- 
dents have not had the time and experience to attain competent levels of 
English language-learning, they will be more likely to fail the test, with 
potentially negative consequences for their future academic studies or other 
pursuits. A second reason why ESL/ELD students may be more adversely
affected by tests is that these tests were originally designed for
non-ESL/ELD students, that is, students whose first language is English. Cornell
(1995) has argued that evaluation standards that heavily rely on English-lan- 
guage skills are established for mainstream students; these standards over- 
look non-mainstream students' individual language progress (including 
second-language students), resulting in failure on tests governed by such 
criteria. 

Literature Review of Test Format Effects 

To attain validity and fairness in tests, efforts need to be made to minimize 
irrelevant effects on test performance (e.g., test format effects) and to ex- 
amine if a given test measures the same construct across students with varied 
backgrounds (Bachman, 1990; Solano-Flores & Trumbull, 2003). Bachman 
emphasized the importance of research into test format effects on test perfor- 
mance, arguing that test developers could use information about interactions 
between test formats and test performance to help design tests that provide 
better and fairer measures of the language abilities that are of interest. 

Various test formats have been argued to elicit varied levels of skill or 
ability. The multiple-choice (MC) format won its popularity in test design 
due to its scoring efficiency and freedom from ambiguity (Gay, 1980), along 
with its being "economically practical" and allowing "reliable, objective 
scoring" (Wainer & Thissen, 1993, p. 103). Nevertheless, many studies 
criticize the MC format. For example, the MC format has been challenged as 
inadequate to fully assess the dimensions of cognitive performance because
MC items provide limited opportunity to demonstrate in-depth knowledge
(Fitzgerald, 1978); this format "may emphasize recall rather than generation 
of answers" (Wainer & Thissen, p. 103). In addition, there is the possibility 
that test-wiseness will contaminate the measurement. Test-wiseness includes 
a variety of general strategies related to efficient test taking (Bachman, 1990). 
With respect to the MC format, the strategy of ruling out as many alterna- 
tives as possible and then guessing among those remaining may be con- 
sidered an example of test-wiseness. 

The constructed-response (CR) format (including short-answer ques- 
tions) is favored by some researchers and practitioners because it can mea- 
sure traits that cannot be tapped by the MC format: for example, assessing 
dynamic cognitive processes (Bennett, Ward, Rock, & Lahart, 1990). Such 
items are also believed to replicate more faithfully the tasks test-takers face in 
actual academic and work settings. Furthermore, CR questions are con- 
sidered to provide tasks that "may have more systemic validity" (Wainer & 
Thissen, 1993, p. 103). Because the CR format requires test-takers to construct 
their own answers, the assumption is that this format must involve higher- 
level thinking. But this idea has been challenged too. For example, Hancock 
(1994) investigated the comparative effectiveness of the MC and CR formats 
for assessing particular levels of complexity in the cognitive domain. He 
constructed examinations for two measurement classes with half MC and 
half CR questions. Equal numbers of questions in each format were written 
to reflect the first four levels of Bloom, Engelhart, Furst, Hill, and
Krathwohl's (1956) taxonomy.³ Hancock's argument was that given sound
test construction, MC questions were able to measure the same abilities as CR 
questions across the first four levels of Bloom et al.'s taxonomy. The results 
indicated a pattern of highly disattenuated correlations between multiple- 
choice and constructed-response questions across increasing cognitive 
levels. Hancock inferred that ensuring that MC questions tap higher cogni- 
tive levels requires test constructors to have the necessary skills to develop 
distracters that reflect the desired cognitive level. 

Based on the theoretical discussions above, earlier empirical studies have 
explored the influence of other test formats on students' performance. For 
example, Fitzgerald (1978) claimed that measurement procedures for 
evaluating the ability to comprehend written discourse (reading comprehen- 
sion) were a critical concern in education. He investigated the differential 
performance of students at three grade levels in two cultures (American
and Irish) using three test formats: multiple-choice cloze, maze, and cloze.
The results indicated that students from the two countries produced sig- 
nificantly different scores at grade 3 and grade 4. Concerning levels of dif- 
ficulty for test formats, the study supported the assumption that MC 
questions would produce the highest student scores because they are
recognition tasks. This result was upheld for both cultural groups. However,
cultural differences were also found. For example, the relatively higher
scores on cloze items for Irish students were accounted for by an integrated 
program containing considerable creative writing. In contrast, the higher 
scores on MC cloze and maze items for US students were explained by their 
more skill-oriented and less integrated program. Thus it was concluded that 
the differential performance between the two cultural groups reflected char- 
acteristics of the educational programs in the cultures such as their different 
foci of orientation in developing reading skills or other sociolinguistic dif- 
ferences. 

Following the thread of Kintsch and Yarbrough's (1982) study on the 
effects of test formats and text structure on reading comprehension, 
Kobayashi (2002) investigated the relationship between students' test perfor- 
mance and the two other variables: text types and test formats. She tested 754 
college EFL (English as a foreign language) students in Japan on four types of
rhetorical organization: association, description, causation, and problem 
solution. Three test formats were employed: cloze, open-ended questions, 
and summary writing. Although the design of her study was challenged by 
some researchers (Chen, 2004), the results suggest that both text types and 
test formats had a significant effect on the EFL students' performance. 
Learners of different proficiency levels were differentially affected. Learners
with higher English-language ability were more susceptible to being influenced
by different test formats. The results demonstrated that different test formats, 
including different types of questions in the same format, measured different 
aspects of reading comprehension. These findings also supported the con- 
cept of a "linguistic threshold" (Kobayashi, 2002, p. 210), according to which 
learners below a certain level of proficiency had difficulty understanding 
beyond sentence-level or literal understanding. Higher-proficiency learners, 
on the contrary, were more aware of overall text organization. 

Shohamy (1984) investigated the effect of different testing methods, levels 
of reading proficiency, and languages of assessment on L2 reading
comprehension by EFL readers. Using multiple-choice and open-ended questions
presented in both the participants' first (L1) and second (L2) languages,
she found that learners performed better on multiple-choice questions
presented in their L1, and these effects were greater for students with low
levels of reading proficiency. Her findings indicated significant effects on 
students' scores in reading comprehension for all three variables: testing 
method, text, and language. In addition, Riley and Lee (1996) compared the 
summary and the recall protocol for reading performance by two levels of 
early-stage L2 readers of French. They asked half the participants to read a 
passage and then write a summary of the passage, and asked the other half 
to read the passage and then recall it. Findings indicated a significant
qualitative difference in performance by the two levels of readers on the two
tasks.

The major focus of earlier studies has been on the effects of test formats on
student performance or assigned levels of proficiency. Based on these earlier
student performance or assigned levels of proficiency. Based on these earlier 
studies, the present study aims to examine test format effects on students 
from different language backgrounds: ESL/ELD students and non- 
ESL/ELD students, that is, students who mostly use English as a second 
language and students who mostly use English as a first language. Given 
that ESL/ELD students are presumed to have a lower level of English lan- 
guage proficiency, the hypothesis in this study is that there would be a 
greater performance gap between ESL/ELD students and non-ESL/ELD 
students if they were required to integrate personal knowledge and experi- 
ence to extend meaning in responses to CR and CRE questions. Two research 
questions guided this study. First, what are the performance patterns of 
ESL/ELD students on the three reading test formats compared with non- 
ESL/ELD students? Second, which test format(s) best distinguish(es) 
ESL/ELD students' performance from that of non-ESL/ELD students? 

Methodology 

Three sets of test data were obtained from the October 2003 administration of 
the OSSLT: (a) the 4,311 ESL/ELD students who wrote the test in October 
2003, (b) 5,000 non-ESL/ELD students who passed the test, and (c) 5,000 
non-ESL/ELD students who failed the test. From the non-ESL/ELD sample,
a further random sample of students who had either passed or failed the test
was selected in conformity with the overall pass-fail ratio in the October
2003 administration (23% fail, 77% pass). To better represent the overall
pattern, non-ESL/ELD students who passed (3,834 cases, about 77%) and
non-ESL/ELD students who failed (1,169 cases, about 23%) were selected as a
comparison with the ESL/ELD students (n=4,311). This resulted in a
non-ESL/ELD student sample of 5,003.

Test scores on the reading component from the three formats — MC, CR, 
and CRE — were obtained from both student groups. There were 40 MC 
questions worth 80 marks, 35 CR questions worth 70 marks, and 25 CRE 
questions worth 50 marks. Descriptive statistics for the raw scores were first 
computed to determine the general patterns of ESL/ELD and non-ESL/ELD 
students' test performance. At the same time, other indicators (for example,
standard deviation [SD], skewness, and kurtosis⁴) were obtained. Discriminant
analyses were then performed to determine which format(s) could
be used to distinguish the two groups. Discriminant analysis is most com- 
monly used to classify cases into two or more groups based on various 
characteristics of cases and to predict group membership for new cases the 
group membership of which is undetermined (Norusis, 1988). The discriminant
equation is $D = a + b_1X_1 + b_2X_2 + \dots + b_iX_i$, in which $X_i$
represents each independent variable, $b_i$ the corresponding coefficient
estimated from the data, and $D$ the predicted group membership. The resulting
coefficients provide the maximum separation among the groups. In this case the
independent variables were MC scores, CR scores, and CRE scores, and $D$ was
predicted ESL/ELD membership. Subsequent correlation analyses were con- 
ducted to check if multicollinearity among the three test formats was a 
concern: that is, if MC scores, CR scores and CRE scores were highly corre- 
lated (.90 or above). Finally, classification results were obtained to demon- 
strate how well the discriminant functions differentiated the ESL/ELD 
students from non-ESL/ELD students. 
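
The discriminant equation described above can be illustrated with a minimal
sketch. It assumes Fisher's two-group linear discriminant with a pooled
within-group covariance matrix and equal prior probabilities, and it enters
all three predictors at once. The function names (`fit_discriminant`,
`classify`) are hypothetical, and the scores are synthetic values generated
only for illustration; this is not the software or the exact procedure used
in the study.

```python
import numpy as np

def fit_discriminant(X0, X1):
    """Estimate D = a + b1*X1 + ... + bk*Xk for two groups (Fisher's LDA).

    X0, X1: arrays of shape (n_cases, n_predictors), one per group.
    Returns (a, b); D > 0 predicts group 1, D <= 0 predicts group 0.
    """
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    pooled = ((n0 - 1) * np.cov(X0, rowvar=False)
              + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    # Coefficients that maximize between-group separation.
    b = np.linalg.solve(pooled, m1 - m0)
    # Assumption: equal priors, so the cut-off is the midpoint of the means.
    a = -0.5 * float((m0 + m1) @ b)
    return a, b

def classify(X, a, b):
    """Predict group membership (0 or 1) from the sign of D."""
    return (a + X @ b > 0).astype(int)

# Synthetic MC, CR, and CRE scores (illustrative values, not study data).
rng = np.random.default_rng(0)
sd = np.array([12.4, 13.2, 9.5])
corr = np.array([[1.00, 0.84, 0.80],
                 [0.84, 1.00, 0.88],
                 [0.80, 0.88, 1.00]])
cov = corr * np.outer(sd, sd)
group0 = rng.multivariate_normal([48.0, 41.0, 26.0], cov, size=2000)
group1 = rng.multivariate_normal([59.0, 51.0, 33.0], cov, size=2000)

a, b = fit_discriminant(group0, group1)
X = np.vstack([group0, group1])
y = np.repeat([0, 1], 2000)
accuracy = float((classify(X, a, b) == y).mean())
print(f"a = {a:.2f}, b = {np.round(b, 3)}, accuracy = {accuracy:.3f}")
```

With highly correlated predictors such as these, a second or third predictor
adds little separation once the first is in the function, which is the
multicollinearity concern raised above.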

Results 

Descriptive analysis. The descriptive results show that both the ESL/ELD 
students and the non-ESL/ELD students obtained their highest mean scores 
in the MC questions (see Table 1): the correct percentage was 59.8% for 
ESL/ELD students and 74.1% for non-ESL/ELD students. Also, both groups 
obtained slightly lower scores on the CR questions: 58.2% for the ESL/ELD 
students and 72.7% for the non-ESL/ELD students. And both groups ob- 
tained their lowest correct percentage scores among the three formats on the 
CRE questions: 51.5% for ESL/ELD students and 65.2% for non-ESL/ELD 
students. The average differences on the three formats between the ESL/ELD 
students and the non-ESL/ELD students were 14.38% for the MC questions, 
14.46% for the CR questions, and 13.68% for the CRE questions, indicating 
that the differences in performance were similar on the three test formats. 
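
As a rough check, the percentage-correct gaps can be recomputed from the raw
means and maximum marks printed in Table 1. The sketch below is illustrative
only, and the variable names are hypothetical. Because the table values are
rounded, the recomputed figures may differ slightly from those reported in
the text; in particular, the MC gap recomputed this way comes out a little
below the reported 14.38%, and the source of that difference cannot be
determined from the table.

```python
# Raw mean scores and maximum marks as printed in Table 1.
max_marks = {"MC": 80, "CR": 70, "CRE": 50}
esl_mean = {"MC": 47.87, "CR": 40.77, "CRE": 25.77}
non_esl_mean = {"MC": 59.30, "CR": 50.89, "CRE": 32.61}

def percent_correct(mean_score, maximum):
    """Convert a raw mean score into a percentage of the maximum marks."""
    return 100.0 * mean_score / maximum

gaps = {}
for fmt, maximum in max_marks.items():
    esl_pct = percent_correct(esl_mean[fmt], maximum)
    non_esl_pct = percent_correct(non_esl_mean[fmt], maximum)
    gaps[fmt] = non_esl_pct - esl_pct
    print(f"{fmt}: ESL/ELD {esl_pct:.1f}%, non-ESL/ELD {non_esl_pct:.1f}%, "
          f"gap {gaps[fmt]:.2f} points")
```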

Table 1
Descriptive Statistics of Reading Test Formats

                                 Mean (%)         SD      Skewness   Kurtosis
ESL/ELD students (n=4,311)
  MC (80)                        47.87 (59.8%)    12.26     -.14       -.40
  CR (70)                        40.77 (58.2%)    13.80     -.49       -.34
  CRE (50)                       25.77 (51.5%)     9.85     -.33       -.48
Non-ESL/ELD students (n=5,003)
  MC (80)                        59.30 (74.1%)    12.46     -.84        .26
  CR (70)                        50.89 (72.7%)    12.54    -1.11        .99
  CRE (50)                       32.61 (65.2%)     9.05     -.86        .38

However, as indicated by the standard deviations (SD), the scores of the
ESL/ELD students were more varied on the CR and CRE test formats than
for non-ESL/ELD students. The standard deviations of ESL/ELD students
on these two formats were 13.80 and 9.85, compared with 12.54 and 9.05 for
non-ESL/ELD students. An examination of the skewness of the groups'
scores demonstrated that the non-ESL/ELD students' performance was
more negatively skewed regardless of format, indicating that non-ESL/ELD
students' scores were more shifted to the higher end of the score distribution
(higher scores). By examining the kurtosis value, it was found that the
ESL/ELD students had negative kurtosis in all three formats (i.e., a flat
distribution), indicating more spread in their scores, as opposed to the
positive kurtosis obtained by the non-ESL/ELD students (i.e., a peaked
distribution).
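
The interpretation of skewness and kurtosis given above can be illustrated
with a short sketch. It assumes the simple moment-based estimators (with
kurtosis expressed as excess kurtosis, so that a normal distribution scores
0); statistical packages often apply small-sample corrections, so their
output may differ slightly. The two samples are synthetic and chosen only to
show a flat distribution and a peaked, left-skewed one.

```python
import numpy as np

def skewness(x):
    """Moment-based skewness: negative when the tail extends to low scores."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(x):
    """Moment-based excess kurtosis: negative = flat, positive = peaked."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(1)
# Flat distribution: scores spread evenly over the whole range.
flat_scores = rng.uniform(0, 100, size=10_000)
# Peaked, left-skewed distribution: most scores bunched near the top.
bunched_scores = 100 - rng.exponential(10, size=10_000)

for name, scores in [("flat", flat_scores), ("bunched", bunched_scores)]:
    print(f"{name}: skewness {skewness(scores):.2f}, "
          f"excess kurtosis {excess_kurtosis(scores):.2f}")
```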

Discriminant analysis. Discriminant analysis was conducted to examine 
which test format had a better discriminating effect between ESL/ELD stu- 
dents and non-ESL/ELD students. The results show that only the MC test 
format was a significant predictor of group membership (see Table 2). The 
other two formats did not provide further significant separation between the 
two groups and are thus excluded from Table 2. 

Large eigenvalues (relative proportion of variance contributed by each
predictor) represent better discriminant functions. In other words, the ratio
of the between-groups sum of squares to the within-groups sum of squares
should be a maximum; in the current output the eigenvalue was .21, which
was relatively small. The square of the canonical correlation (the multiple
correlation between predictors and groups, here .42) and the difference in the
value of Wilks' lambda from 1 (an index used to test the significance of the
discriminant function; here 1 - .82) both indicate that only 18% of the
variance was associated with the differences between groups (see Table 2).
Although the MC format provided a significant distinction between ESL/ELD
students and non-ESL/ELD students, a large proportion of the total variance
was attributable to the differences within groups. In sum, the low eigenvalue
coupled with the relatively high Wilks' lambda indicated that, although
significant, the MC format did not strongly differentiate ESL/ELD and
non-ESL/ELD students.
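
For a single discriminant function, the eigenvalue, Wilks' lambda, and the
canonical correlation are algebraically linked, so the reported indices can
be cross-checked against one another. The sketch below starts from the
eigenvalue reported in Table 2; because that value is itself rounded to two
decimals, the derived figures agree with the reported ones only to within
rounding. Variable names are illustrative.

```python
import math

eigenvalue = 0.21  # as reported in Table 2 (rounded)

# Wilks' lambda: proportion of variance NOT explained by group differences.
wilks_lambda = 1.0 / (1.0 + eigenvalue)
# Canonical correlation between the discriminant scores and group membership.
canonical_r = math.sqrt(eigenvalue / (1.0 + eigenvalue))
# Proportion of variance associated with group differences.
shared_variance = canonical_r ** 2  # identical to 1 - wilks_lambda

print(f"Wilks' lambda         = {wilks_lambda:.3f}")
print(f"canonical correlation = {canonical_r:.3f}")
print(f"shared variance       = {shared_variance:.3f}")
```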

Table 2
Discriminant Functions

            Standardized Canonical
            Discriminant Function    Wilks'                 Canonical
            Coefficient              Lambda    Eigenvalue   Correlation   R²     Sig.
MC score    1.00                     .82       .21          .42           .18    .001

Follow-up correlational analyses revealed that the correlations among the
three test formats (MC, CR, and CRE) were high (.84 between MC and CR;
.80 between MC and CRE; .88 between CR and CRE). This explains why the
other test formats were unable to discriminate group membership further in
the presence of the MC results. Overall, the difference among test formats
did not account for much of the variance between ESL/ELD students' and
non-ESL/ELD students' OSSLT reading performance.

Table 3
Classification Results

                          Predicted Group Membership
                          ESL/ELD      Non-ESL/ELD      Total
Count   ESL/ELD           2,785        1,526            4,311
        Non-ESL/ELD       1,368        3,635            5,003
%       ESL/ELD           64.60        35.40            100.0
        Non-ESL/ELD       27.34        72.66            100.0

Note. 68.63% of original grouped cases correctly classified.

Given these results, it is not surprising that the classification results (see
Table 3) also demonstrated that with the current discriminant function, test
formats did not prove to be good discriminators in separating ESL/ELD
students' and non-ESL/ELD students' reading performance on the OSSLT.
Only 64.60% of ESL/ELD students (2,785 out of 4,311) were classified into
their correct group, 14.60 percentage points above the chance level. Hence
over 35% of ESL/ELD students (1,526) could be mistakenly grouped as
non-ESL/ELD students. Similarly, only 72.66% of non-ESL/ELD students (3,635
out of 5,003) were classified into their correct group, 22.66 percentage
points above the chance level. Over 27% of non-ESL/ELD students (1,368) could
be mistakenly classified as ESL/ELD students.

Given the discriminant function above, 68.63% of the original cases were
correctly grouped (18.63 percentage points better than the chance level).
Together these results suggest that test format is only a weak predictor of
ESL/ELD membership.
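
The classification rates can be recomputed from the counts printed in Table 3,
as in the sketch below. Two ways of summarizing overall accuracy are shown
because they give different answers here: the unweighted mean of the two
group rates reproduces the 68.63% in the table note, whereas pooling the
counts gives a slightly higher figure. Which definition the authors used is
an assumption on this reading, and the variable names are illustrative.

```python
# Rows: actual group; columns: predicted group (ESL/ELD, non-ESL/ELD).
counts = {
    "ESL/ELD": {"ESL/ELD": 2785, "Non-ESL/ELD": 1526},
    "Non-ESL/ELD": {"ESL/ELD": 1368, "Non-ESL/ELD": 3635},
}

# Percentage of each actual group assigned to its own group.
group_rates = {}
for group, row in counts.items():
    group_rates[group] = 100.0 * row[group] / sum(row.values())
    print(f"{group}: {group_rates[group]:.2f}% correctly classified")

# Summary 1: unweighted mean of the two group rates.
unweighted_rate = sum(group_rates.values()) / len(group_rates)

# Summary 2: all correct cases divided by all cases (weighted by group size).
total_correct = sum(counts[g][g] for g in counts)
total_cases = sum(sum(row.values()) for row in counts.values())
pooled_rate = 100.0 * total_correct / total_cases

print(f"unweighted mean of group rates: {unweighted_rate:.2f}%")
print(f"pooled rate over all cases:     {pooled_rate:.2f}%")
```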

Discussion and Conclusion 

The results show that ESL/ELD students performed less well in all three test 
formats in the reading section than their non-ESL/ELD counterparts; how- 
ever, the general patterns of difficulty were the same between the two com- 
parison groups. Both groups achieved a higher percentage of correct answers 
in MC questions, lower in CR questions, and the lowest in CRE questions. 
These findings are partly supported by the literature; that is, MC questions 
are generally considered to be easier to answer correctly than CR questions 
or CRE questions (Fitzgerald, 1978; Shohamy, 1984). Students obtain higher 
achievement scores on MC questions than CR questions because MC requires
"comprehension and selection," whereas CR requires "comprehension and
production" (Wolf, 1993, p. 481). Furthermore, MC questions are
usually regarded as conducive to test-wiseness (Bachman, 1990). Such
strategies may have resulted in the higher scores obtained on the MC format 
compared with the CR and CRE formats in this study. Further evidence to 
support this was provided by Cheng and Gao (2002), who found that in 
doing MC questions on reading comprehension, even in the absence of the 
associated reading passages, EFL students achieved scores above the chance 
level. 

The finding that ESL/ELD students' performances were more varied than
those of non-ESL/ELD students also has important implications. These
ESL/ELD students, although all engaged in the English-language development
process, vary considerably in their literacy achievement. The results thus
suggest that the ESL/ELD population should not be treated as a homogeneous
group, a common practice in school systems. Researchers and teachers may wish
to pay more attention to individual differences among ESL/ELD students
instead of viewing them as a single whole.

The discriminant analysis results combined with the unsatisfactory
classification results based on the discriminant functions indicate that test
formats provide weak discriminating power in separating the performance
of ESL/ELD students and non-ESL/ELD students on the OSSLT. Given the
current discriminant function based on test formats, approximately one
third of the students could not be assigned to their correct group. Combining 
the results of descriptive analysis and discriminant analysis indicates that 
there are large performance gaps between ESL/ELD students and non- 
ESL/ELD students, but these gaps cannot be strongly attributed to test 
format differences. Cheng, Klinger, and Zheng (2007) conducted a two-year 
cross-validation study of the OSSLT data. Their results showed that the 
discrimination effect regarding test formats was not consistent over the two 
years of the study. For the February 2002 data, CR questions best separated 
the two groups (β=.42, p<.001); MC questions had a discriminant coefficient of
.34 (p<.001), and CRE questions had the lowest discriminant coefficient of .30
(p<.001). For the 2003 data (which are the same data as in this study), only
MC questions separated the two groups (β=1, p<.001). Also, Cheng et al.
found that the performance differences were smaller in 2003 than in 2002. 
One possible explanation they offered was that the first test administration of 
the OSSLT had been in February 2002, whereas the October 2003 adminis- 
tration was the third. Thus the smaller performance differences could in part 
reflect the progress that ESL/ELD students had made or the extent to which 
these students had been coached for the test. Also, the smaller performance 
differences in 2003 might have led to the diminishing of the discrimination 
effects of the other two test formats, leaving only the MC format as a sig- 
nificant discriminator; therefore, test format did not provide a systematic 
separation between the ESL/ELD and non-ESL/ELD students. The
discriminating effects of test formats on test performance were significant yet
weak, and the most difficult constructs (CRE questions) did not necessarily 
coincide with the best discriminator (MC questions) (Cheng et al.). 
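
To make the idea of classification accuracy concrete, the following is a minimal sketch, not the procedure used in this study. It assumes a single predictor, equal group sizes, and equal within-group variances, in which case a linear discriminant rule reduces to a cutoff halfway between the two group means. The score lists and the function names (midpoint_classifier, hit_rate) are hypothetical and are not OSSLT data.

```python
from statistics import mean


def midpoint_classifier(group_a, group_b):
    """Build a one-variable discriminant rule for two groups.

    Assumption: one predictor, equal priors, equal within-group variance,
    so the rule is a cutoff halfway between the two group means.
    """
    mean_a, mean_b = mean(group_a), mean(group_b)
    cutoff = (mean_a + mean_b) / 2
    a_is_lower = mean_a < mean_b

    def classify(score):
        # Scores below the cutoff go to whichever group has the lower mean.
        below = score < cutoff
        return "A" if below == a_is_lower else "B"

    return classify


def hit_rate(classify, group_a, group_b):
    """Proportion of cases assigned to their actual group."""
    correct = sum(classify(s) == "A" for s in group_a)
    correct += sum(classify(s) == "B" for s in group_b)
    return correct / (len(group_a) + len(group_b))


# Hypothetical multiple-choice scores (invented for illustration only).
esl_scores = [8, 10, 11, 12, 13, 14, 15, 16, 18, 18]
non_esl_scores = [11, 13, 14, 15, 16, 17, 18, 19, 20, 22]

rule = midpoint_classifier(esl_scores, non_esl_scores)
rate = hit_rate(rule, esl_scores, non_esl_scores)
print(f"Correctly classified: {rate:.0%}")
```

In this invented example the group means differ by three points, yet the score distributions overlap enough that roughly one third of the cases fall on the wrong side of the cutoff. A clear mean difference can therefore coexist with weak classification.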

It is worth noting that the initial hypothesis guiding the current study was 
not supported in the findings: ESL/ELD students did not display noticeable 
extra performance discrepancies in CR or CRE questions compared with MC 
questions. Although a systematic analysis of the actual OSSLT questions is 
necessary to gain a deeper understanding of this result, a possible explanation 
can be offered. The CR and CRE questions on the OSSLT might not have 
required students to employ deeper cognitive levels (synthesis, for example) 
or to apply sophisticated background knowledge (which would have placed 
ESL/ELD students at a disadvantage) in order to answer them correctly. 
Thus further investigation would be justified with respect to 
whether the actual CR and CRE items on the OSSLT support the following 
two arguments from the literature reviewed above: (a) constructed-response 
formats are more advanced in assessing dynamic cognitive processes, as they 
are capable of asking students to employ not only knowledge-level but also 
synthesis-level cognition (Bloom et al., 1956); and (b) constructed-response 
formats replicate more faithfully the tasks that test-takers face in academic 
and work settings (Bennett et al., 1990). Subsequent studies could combine 
the analysis of test performance with further analysis of how test questions 
are constructed and answered by these students, that is, the reasoning and 
cognitive processes behind their choices and answers on the test. 

Overall, the results of this study confirm that ESL/ELD students displayed 
substantial performance discrepancies compared with non-ESL/ELD 
students. These discrepancies, however, are similar in size across test formats. The 
implication of this finding is that when teachers are preparing ESL/ELD 
students for the OSSLT, less focus should be put on the test format issue. 
Instead, with ESL/ELD students taking this large-scale provincial test while 
developing their English proficiency and literacy competence, a great deal of 
the variance in performance difference appears to relate to other aspects such 
as reading skills, reading strategies, or text types of reading passages, as 
indicated in Cheng et al.'s (2007) recent study. Thus ESL teachers' classroom 
priority should be their students' overall literacy competence rather than 
attention to test formats. This would include helping students to develop 
better reading skills and strategies and familiarizing them with reading the 
text types on the test (e.g., information, narrative, graphic). 

Notes 

1 In the sample OSSLT booklet, questions 1 and 2 are MC questions, question 3 is a CR question, 
and questions 4 and 5 are CRE questions. 

2 A deferral is made in consultation with the student and parents or the adult student, and with 
the appropriate teaching staff, on the basis that the student would not be able to participate in the 
test even with accommodations (EQAO, 2006). 

3 There are six cognitive levels in Bloom et al.'s (1956) taxonomy: knowledge, comprehension, 
application, analysis, synthesis, and evaluation. 

4 SD is a measure of variability; the larger the SD, the bigger the variance of the examined variable 
among the groups. Skewness and kurtosis are two measures usually reported to reflect the 
normality of the data. Normally distributed data have a value of zero for both kurtosis and 
skewness. 
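
As a rough illustration of the three statistics described in note 4, the sketch below computes them from population moments. This choice of formula is an assumption: statistical packages often apply small-sample corrections, so their reported values may differ slightly. Kurtosis is computed here as excess kurtosis (the fourth standardized moment minus 3), which is the version that equals zero for normally distributed data. The function name describe and the score lists are hypothetical.

```python
import math


def describe(scores):
    """Return (sd, skewness, excess_kurtosis) using population moments."""
    n = len(scores)
    mean = sum(scores) / n
    # Second, third, and fourth central moments (dividing by n).
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    sd = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return sd, skewness, excess_kurtosis


# Invented score lists, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

sym_sd, sym_skew, sym_kurt = describe(symmetric)
rs_sd, rs_skew, rs_kurt = describe(right_skewed)
print(f"symmetric:    SD={sym_sd:.2f}, skewness={sym_skew:.2f}")
print(f"right-skewed: SD={rs_sd:.2f}, skewness={rs_skew:.2f}")
```

The symmetric list has a skewness of zero, whereas the list with one high outlier has a positive skewness and a larger SD.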

Appendix: Sample OSSLT Reading Booklet 


Reading 


Get Rid of that T-shirt! 


A recent newspaper article pointed out that Canadians purchased 
73.7 million T-shirts last year. The article went on to say that the average 
North American owns 25 of them. The T-shirt was praised as the 
favourite garment of the twentieth century, worn by men and women, 
young and old, rich and poor. As we begin a new century, I suggest we 
leave the old T-shirt behind. 

The first wearers of an “undershirt” or a “work shirt” in public were 
making a rebellious statement, but it quickly became the accepted style. 
Eventually, we all began to wear underwear anywhere and everywhere. 

In the 60s, hippies tie-dyed their T-shirts. In the 70s, punk rockers 
shredded, safety pinned and spray-painted them. In the 80s, T-shirts 
became great democratic portable billboards — each shirt an editorial 
column or personal ad telling others about the places the wearer has 
been, or the products, bands and politics the wearer supports or abhors. 

The most recent trend seems to be toward slogans or messages that 
are increasingly meaningless. The best known examples are expensive 
T-shirts sporting only the name of the manufacturer. It seems strange 
that people are now expressing themselves by broadcasting their support 
of a shirt manufacturer. I can't think of anything less individualistic or 
less attractive to wear in public. 

The T-shirt is basically a formless, ugly garment. What should happen 
to the 25 T-shirts each of us is supposed to have? I suggest that we use 
them as rags for washing our 1.7 cars. 


Multiple choice (Circle the letter next to the best or most correct answer for each question.) 

1. In this selection, the T-shirt is compared to 

A a slogan. 

B a product. 

C a garment. 

D a billboard. 

2. Which of the following is the best way to describe the purpose of this selection? 

F to state an opinion 

G to describe a product 

H to present information 

J to provide instructions 

Written answers 

3. What is the meaning of the phrase "broadcasting their support of a shirt manufacturer" as 
used in paragraph 4? 


4. Explain the purpose of the question in paragraph 5. 


5. Do you think T-shirts will continue to be popular? Use one piece of information from this 
selection to support your answer. 